Overview

Dataset statistics

Number of variables: 10
Number of observations: 264937
Missing cells: 63562
Missing cells (%): 2.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 38.3 MiB
Average record size in memory: 151.5 B
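
As a quick consistency check, the missing-cell percentage can be recomputed from the counts in this report. The snippet below is only an illustrative sketch and is not part of the generated output: the variable names are hypothetical, and the per-column counts are copied from the missing-value warnings listed further down in this overview.

```python
# Per-column missing counts, copied from the warnings in this report.
# The dictionary keys are shorthand labels chosen for this sketch.
missing_per_column = {
    "SO2": 12048,
    "CO": 8464,
    "O3": 10403,
    "PM10": 14719,
    "NO2": 8924,
    "NO": 9004,
}
n_variables = 10
n_observations = 264937

# A "cell" is one (row, column) entry, so the table has rows * columns cells.
missing_cells = sum(missing_per_column.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, f"{missing_pct:.1f}%")
```

The remaining four columns (date, station, lat, lon) report no missing values, so the six pollutant columns account for every missing cell.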

Variable types

NUM: 8
CAT: 1
DATE: 1

Reproduction

Analysis started: 2020-04-12 18:54:09.404436
Analysis finished: 2020-04-12 19:01:46.277039
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

$SO_2$ (µg/m3) has 12048 (4.5%) missing values
CO (ppm) has 8464 (3.2%) missing values
$O_3$ (µg/m3) has 10403 (3.9%) missing values
$PM_{10}$ (µg/m3) has 14719 (5.6%) missing values
$NO_2$ (µg/m3) has 8924 (3.4%) missing values
NO (µg/m3) has 9004 (3.4%) missing values
$SO_2$ (µg/m3) is highly skewed (γ1 = 34.17882096)
$SO_2$ (µg/m3) has 78999 (29.8%) zeros
CO (ppm) has 11044 (4.2%) zeros

Variables

Date & Time
Date

Distinct count: 43824
Unique (%): 16.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
Minimum: 2011-01-01 00:00:00
Maximum: 2015-12-31 23:00:00
Histogram

$SO_2$ (µg/m3)
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS
Distinct count: 15119
Unique (%): 6.0%
Missing: 12048
Missing (%): 4.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.1579326887953871
Minimum: 0.0
Maximum: 273.708955847191
Zeros: 78999
Zeros (%): 29.8%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0.5228240342
Q3: 1.563885299
95-th percentile: 4.135172947
Maximum: 273.7089558
Range: 273.7089558
Interquartile range (IQR): 1.563885299

Descriptive statistics

Standard deviation: 2.296314677
Coefficient of variation (CV): 1.983115858
Kurtosis: 2950.186933
Mean: 1.157932689
Median Absolute Deviation (MAD): 1.13210345
Skewness: 34.17882096
Sum: 292828.4397
Variance: 5.273061098
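
Several of the figures above are simple functions of the others, which gives a cheap way to sanity-check the report. The following is a minimal sketch using the rounded $SO_2$ values printed in this section; the variable names are hypothetical, and small differences in the last digits are expected because the inputs are already rounded.

```python
# Rounded SO2 summary values as printed in this section.
mean = 1.157932689
std = 2.296314677
q1, q3 = 0.0, 1.563885299
minimum, maximum = 0.0, 273.7089558

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.6f} variance={variance:.6f} IQR={iqr} range={value_range}")
```

A CV close to 2 means the standard deviation is about twice the mean, which fits the heavy right tail flagged by the skewness warning for this variable.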
Histogram with fixed size bins (bins=10)
Value Count Frequency (%)
0 78999 29.8%
 
0.5208641808 2363 0.9%
 
0.7812962712 2261 0.9%
 
0.2604320904 2231 0.8%
 
0.2606475499 2033 0.8%
 
0.2611202188 1992 0.8%
 
1.041728362 1940 0.7%
 
0.2600073888 1921 0.7%
 
0.2611377503 1883 0.7%
 
0.5200147776 1801 0.7%
 
Other values (15109) 155465 58.7%
 
(Missing) 12048 4.5%
 
Value Count Frequency (%)
0 78999 29.8%
 
0.2543025298 1 < 0.1%
 
0.2544061069 1 < 0.1%
 
0.2545720059 1 < 0.1%
 
0.2545927585 1 < 0.1%
 
Value Count Frequency (%)
273.7089558 1 < 0.1%
 
246.4349729 1 < 0.1%
 
233.8691658 1 < 0.1%
 
210.3116422 1 < 0.1%
 
209.9314216 1 < 0.1%
 

CO (ppm)
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 445
Unique (%): 0.2%
Missing: 8464
Missing (%): 3.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.37983881344235076
Minimum: 0.0
Maximum: 7.66
Zeros: 11044
Zeros (%): 4.2%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.01
Q1: 0.17
median: 0.33
Q3: 0.53
95-th percentile: 0.9
Maximum: 7.66
Range: 7.66
Interquartile range (IQR): 0.36

Descriptive statistics

Standard deviation: 0.3133495106
Coefficient of variation (CV): 0.8249539002
Kurtosis: 23.10958797
Mean: 0.3798388134
Median Absolute Deviation (MAD): 0.2248706058
Skewness: 2.804800954
Sum: 97418.4
Variance: 0.09818791578
Histogram with fixed size bins (bins=10)
Value Count Frequency (%)
0 11044 4.2%
 
0.01 4667 1.8%
 
0.2 4051 1.5%
 
0.21 4031 1.5%
 
0.26 4025 1.5%
 
0.24 4008 1.5%
 
0.23 4007 1.5%
 
0.17 3979 1.5%
 
0.28 3971 1.5%
 
0.25 3967 1.5%
 
Other values (435) 208723 78.8%
 
(Missing) 8464 3.2%
 
Value Count Frequency (%)
0 11044 4.2%
 
0.01 4667 1.8%
 
0.02 3475 1.3%
 
0.03 3037 1.1%
 
0.04 2780 1.0%
 
Value Count Frequency (%)
7.66 1 < 0.1%
 
6.26 1 < 0.1%
 
6.09 1 < 0.1%
 
6.05 1 < 0.1%
 
5.99 1 < 0.1%
 

$O_3$ (µg/m3)
Real number (ℝ≥0)

MISSING
Distinct count: 63252
Unique (%): 24.9%
Missing: 10403
Missing (%): 3.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.592120864219355
Minimum: 0.0
Maximum: 151.80936261525568
Zeros: 1117
Zeros (%): 0.4%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2.175734911
Q1: 7.307514244
median: 12.87810793
Q3: 19.95511802
95-th percentile: 32.99522494
Maximum: 151.8093626
Range: 151.8093626
Interquartile range (IQR): 12.64760378

Descriptive statistics

Standard deviation: 9.640177003
Coefficient of variation (CV): 0.660642623
Kurtosis: 1.709970747
Mean: 14.59212086
Median Absolute Deviation (MAD): 7.575743751
Skewness: 1.058608995
Sum: 3714190.892
Variance: 92.93301266
Histogram with fixed size bins (bins=10)
Value Count Frequency (%)
0 1117 0.4%
 
19.36820278 265 0.1%
 
18.97692596 248 0.1%
 
17.41181867 242 0.1%
 
17.21618025 237 0.1%
 
20.93331008 234 0.1%
 
18.58564914 233 0.1%
 
18.39001072 230 0.1%
 
19.75947961 229 0.1%
 
18.78128755 228 0.1%
 
Other values (63242) 251271 94.8%
 
(Missing) 10403 3.9%
 
Value Count Frequency (%)
0 1117 0.4%
 
0.1915285258 1 < 0.1%
 
0.1918425842 1 < 0.1%
 
0.1920630387 1 < 0.1%
 
0.1921892402 1 < 0.1%
 
Value Count Frequency (%)
151.8093626 1 < 0.1%
 
108.2943748 1 < 0.1%
 
103.7071195 1 < 0.1%
 
100.9446188 1 < 0.1%
 
98.73216077 1 < 0.1%
 

$PM_{10}$ (µg/m3)
Real number (ℝ≥0)

MISSING
Distinct count: 1665
Unique (%): 0.7%
Missing: 14719
Missing (%): 5.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.336687608405477
Minimum: 0.0
Maximum: 969.4
Zeros: 19
Zeros (%): < 0.1%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 5.7
Q1: 14
median: 22.2
Q3: 32.4
95-th percentile: 54.6
Maximum: 969.4
Range: 969.4
Interquartile range (IQR): 18.4

Descriptive statistics

Standard deviation: 17.90939019
Coefficient of variation (CV): 0.7068560209
Kurtosis: 126.2346366
Mean: 25.33668761
Median Absolute Deviation (MAD): 12.09399281
Skewness: 5.306224199
Sum: 6339695.3
Variance: 320.7462568
Histogram with fixed size bins (bins=10)
Value Count Frequency (%)
19.4 818 0.3%
 
18.6 808 0.3%
 
18 807 0.3%
 
19.1 801 0.3%
 
20.8 799 0.3%
 
17.3 797 0.3%
 
20 796 0.3%
 
22.7 795 0.3%
 
16.3 794 0.3%
 
15.6 792 0.3%
 
Other values (1655) 242211 91.4%
 
(Missing) 14719 5.6%
 
Value Count Frequency (%)
0 19 < 0.1%
 
0.1 42 < 0.1%
 
0.2 34 < 0.1%
 
0.3 33 < 0.1%
 
0.4 35 < 0.1%
 
Value Count Frequency (%)
969.4 1 < 0.1%
 
878.4 1 < 0.1%
 
832.9 1 < 0.1%
 
656.8 1 < 0.1%
 
598 1 < 0.1%
 

$NO_2$ (µg/m3)
Real number (ℝ≥0)

MISSING
Distinct count: 111995
Unique (%): 43.7%
Missing: 8924
Missing (%): 3.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 21.15237698896905
Minimum: 0.0
Maximum: 268.6440491107547
Zeros: 217
Zeros (%): 0.1%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 4.929146911
Q1: 12.58186922
median: 19.41337046
Q3: 27.73208362
95-th percentile: 42.75497444
Maximum: 268.6440491
Range: 268.6440491
Interquartile range (IQR): 15.1502144

Descriptive statistics

Standard deviation: 12.08765172
Coefficient of variation (CV): 0.5714559514
Kurtosis: 4.257981515
Mean: 21.15237699
Median Absolute Deviation (MAD): 9.28864023
Skewness: 1.234783773
Sum: 5415283.49
Variance: 146.111324
Histogram with fixed size bins (bins=10)
Value Count Frequency (%)
0 217 0.1%
 
16.31439814 149 0.1%
 
17.06448541 141 0.1%
 
19.46565993 139 0.1%
 
17.62705086 134 0.1%
 
16.12687632 133 0.1%
 
17.43952905 131 < 0.1%
 
15.37678905 130 < 0.1%
 
13.68909269 129 < 0.1%
 
20.25235631 127 < 0.1%
 
Other values (111985) 254583 96.1%
 
(Missing) 8924 3.4%
 
Value Count Frequency (%)
0 217 0.1%
 
0.01875092284 46 < 0.1%
 
0.03750184567 38 < 0.1%
 
0.05625276851 14 < 0.1%
 
0.07500369135 21 < 0.1%
 
Value Count Frequency (%)
268.6440491 1 < 0.1%
 
204.2906119 1 < 0.1%
 
165.6963269 1 < 0.1%
 
157.4632923 1 < 0.1%
 
153.2053251 1 < 0.1%
 

NO (µg/m3)
Real number (ℝ≥0)

MISSING
Distinct count: 154118
Unique (%): 60.2%
Missing: 9004
Missing (%): 3.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.58717839929262
Minimum: 0.0
Maximum: 684.9544090228212
Zeros: 1078
Zeros (%): 0.4%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1.469860324
Q1: 9.339641486
median: 23.8082045
Q3: 47.04165279
95-th percentile: 130.9588906
Maximum: 684.954409
Range: 684.954409
Interquartile range (IQR): 37.7020113

Descriptive statistics

Standard deviation: 46.0990583
Coefficient of variation (CV): 1.226457006
Kurtosis: 12.06897714
Mean: 37.5871784
Median Absolute Deviation (MAD): 29.7755884
Skewness: 2.967953558
Sum: 9619799.329
Variance: 2125.123176
Histogram with fixed size bins (bins=10)
Value Count Frequency (%)
0 1078 0.4%
 
0.9767468513 182 0.1%
 
0.8546534949 174 0.1%
 
0.6104667821 172 0.1%
 
1.220933564 162 0.1%
 
1.465120277 161 0.1%
 
0.4883734257 159 0.1%
 
1.587213633 157 0.1%
 
0.7325601385 154 0.1%
 
1.343026921 153 0.1%
 
Other values (154108) 253381 95.6%
 
(Missing) 9004 3.4%
 
Value Count Frequency (%)
0 1078 0.4%
 
0.01223147655 5 < 0.1%
 
0.01231627782 1 < 0.1%
 
0.01235371964 1 < 0.1%
 
0.01238299881 1 < 0.1%
 
Value Count Frequency (%)
684.954409 1 < 0.1%
 
647.7467967 1 < 0.1%
 
560.8769782 1 < 0.1%
 
545.6961459 1 < 0.1%
 
545.658232 1 < 0.1%
 

station
Categorical

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
Value Count Frequency (%)
PARALELA-CAB 43824 16.5%
 
DIQUE DO TORORÓ 39768 15.0%
 
RIO VERMELHO 39730 15.0%
 
CAMPO GRANDE 39452 14.9%
 
PIRAJÁ 37968 14.3%
 
AV ACM - DETRAN 26085 9.8%
 
ITAIGARA 19308 7.3%
 
AV BARROS REIS 18802 7.1%
 

Length

Max length: 15
Mean length: 11.7362505
Min length: 6
Value Count Frequency (%)
Uppercase_Letter 22 91.7%
 
Dash_Punctuation 1 4.2%
 
Space_Separator 1 4.2%
 
Value Count Frequency (%)
Latin 22 91.7%
 
Common 2 8.3%
 
Value Count Frequency (%)
ASCII 22 100.0%
 

lat
Real number (ℝ)

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -12.969651704624921
Minimum: -13.005500404304225
Maximum: -12.898903466026768
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: -13.0055004
5-th percentile: -13.0055004
Q1: -12.98973907
median: -12.98371943
Q3: -12.95380924
95-th percentile: -12.89890347
Maximum: -12.89890347
Range: 0.1065969383
Interquartile range (IQR): 0.03592983707

Descriptive statistics

Standard deviation: 0.03311829317
Coefficient of variation (CV): -0.00255352217
Kurtosis: 0.2492472999
Mean: -12.9696517
Median Absolute Deviation (MAD): 0.02628537438
Skewness: 1.146610611
Sum: -3436140.614
Variance: 0.001096821342
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-13.0055004 -12.99233172 -12.98672925 -12.98086074 -12.97112677 -12.95903037 -12.92635635 -12.89890347], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
-12.95380924 43824 16.5%
 
-12.98371943 39768 15.0%
 
-13.0055004 39730 15.0%
 
-12.98973907 39452 14.9%
 
-12.89890347 37968 14.3%
 
-12.97800204 26085 9.8%
 
-12.99492436 19308 7.3%
 
-12.9642515 18802 7.1%
 
Value Count Frequency (%)
-13.0055004 39730 15.0%
 
-12.99492436 19308 7.3%
 
-12.98973907 39452 14.9%
 
-12.98371943 39768 15.0%
 
-12.97800204 26085 9.8%
 
Value Count Frequency (%)
-12.89890347 37968 14.3%
 
-12.95380924 43824 16.5%
 
-12.9642515 18802 7.1%
 
-12.97800204 26085 9.8%
 
-12.98371943 39768 15.0%
 

lon
Real number (ℝ)

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -38.477907259929516
Minimum: -38.520087964258316
Maximum: -38.4283765135188
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: -38.52008796
5-th percentile: -38.52008796
Q1: -38.50698772
median: -38.4793294
Q3: -38.45784983
95-th percentile: -38.42837651
Maximum: -38.42837651
Range: 0.09171145074
Interquartile range (IQR): 0.04913788754

Descriptive statistics

Standard deviation: 0.02961041118
Coefficient of variation (CV): -0.0007695431818
Kurtosis: -0.9196228844
Mean: -38.47790726
Median Absolute Deviation (MAD): 0.024273613
Skewness: 0.244296259
Sum: -10194221.32
Variance: 0.0008767764503
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-38.52008796 -38.49708083 -38.48325167 -38.47734722 -38.47214644 -38.46338883 -38.44311317 -38.42837651], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
-38.42837651 43824 16.5%
 
-38.50698772 39768 15.0%
 
-38.48717394 39730 15.0%
 
-38.52008796 39452 14.9%
 
-38.45784983 37968 14.3%
 
-38.46892783 26085 9.8%
 
-38.47536505 19308 7.3%
 
-38.4793294 18802 7.1%
 
Value Count Frequency (%)
-38.52008796 39452 14.9%
 
-38.50698772 39768 15.0%
 
-38.48717394 39730 15.0%
 
-38.4793294 18802 7.1%
 
-38.47536505 19308 7.3%
 
Value Count Frequency (%)
-38.42837651 43824 16.5%
 
-38.45784983 37968 14.3%
 
-38.46892783 26085 9.8%
 
-38.47536505 19308 7.3%
 
-38.4793294 18802 7.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that, for a linear relationship, the slope of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
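
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation written for this note, not the code the profiling tool runs; the function name `pearson_r` and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of (cross-)deviations are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Note: a constant input has zero deviation and raises ZeroDivisionError.
    return cov / (dev_x * dev_y)


x = [1, 2, 3, 4]
print(pearson_r(x, [3, 5, 7, 9]))   # y = 2x + 1, increasing line
print(pearson_r(x, [8, 6, 4, 2]))   # y = -2x + 10, decreasing line
```

Shifting or rescaling either input by a positive factor leaves the result unchanged, which is the location and scale invariance mentioned above.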

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
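
A minimal sketch of that procedure follows: rank each variable, then apply the Pearson formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the average of their ranks is an assumption of this sketch (a common convention, not something stated in this report).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    dev_x = math.sqrt(sum((a - mx) ** 2 for a in rx))
    dev_y = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (dev_x * dev_y)


# y = x**3 is monotonic but not linear in x.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because only the ordering of the values matters, any strictly increasing transformation of either variable leaves ρ unchanged.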

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
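
The pair counting can be sketched directly from that definition. The function below is a hypothetical illustration of the formula as worded here (often called τ-a); tie-adjusted variants such as τ-b use a different denominator, so results may differ from library output when the data contain ties.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, as described above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 is a tie and counts as neither
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


# Six pairs in total; only the pair (2, 3) vs (3, 2) is discordant.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

This brute-force version examines every pair, so its cost grows quadratically with the number of observations; it is meant to show the definition, not to be fast.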

Missing values

Sample

First rows

(index) | Date & Time | $SO_2$ (µg/m3) | CO (ppm) | $O_3$ (µg/m3) | $PM_{10}$ (µg/m3) | $NO_2$ (µg/m3) | NO (µg/m3) | station | lat | lon
0 | 2013-09-01 04:00:00 | NaN | 0.12 | NaN | NaN | 16.624730 | 20.604612 | AV ACM - DETRAN | -12.978002 | -38.468928
1 | 2013-09-01 05:00:00 | 3.619774 | 0.15 | 14.141319 | NaN | 21.908749 | 23.132639 | AV ACM - DETRAN | -12.978002 | -38.468928
2 | 2013-09-01 06:00:00 | 1.556745 | 0.14 | 12.052343 | NaN | 20.494672 | 16.164273 | AV ACM - DETRAN | -12.978002 | -38.468928
3 | 2013-09-01 07:00:00 | 0.778761 | 0.08 | 10.696926 | 13.4 | 26.469953 | 17.996282 | AV ACM - DETRAN | -12.978002 | -38.468928
4 | 2013-09-01 08:00:00 | 1.038693 | 0.12 | 6.031182 | 14.0 | 30.394629 | 40.748375 | AV ACM - DETRAN | -12.978002 | -38.468928
5 | 2013-09-01 09:00:00 | 0.779603 | 0.09 | 3.699299 | 11.2 | 29.297733 | 47.108748 | AV ACM - DETRAN | -12.978002 | -38.468928
6 | 2013-09-01 10:00:00 | 0.519908 | 0.11 | 3.895295 | 11.1 | 30.240847 | 37.504716 | AV ACM - DETRAN | -12.978002 | -38.468928
7 | 2013-09-01 11:00:00 | 0.260214 | 0.08 | 8.578218 | 17.8 | 20.741297 | 8.776102 | AV ACM - DETRAN | -12.978002 | -38.468928
8 | 2013-09-01 12:00:00 | 1.564021 | 0.05 | 14.647592 | 18.1 | 15.910815 | 3.418911 | AV ACM - DETRAN | -12.978002 | -38.468928
9 | 2013-09-01 13:00:00 | 1.565982 | 0.07 | 16.230322 | 15.7 | 9.558456 | 0.000000 | AV ACM - DETRAN | -12.978002 | -38.468928

Last rows

(index) | Date & Time | $SO_2$ (µg/m3) | CO (ppm) | $O_3$ (µg/m3) | $PM_{10}$ (µg/m3) | $NO_2$ (µg/m3) | NO (µg/m3) | station | lat | lon
264927 | 2015-12-31 15:00:00 | 1.300037 | 0.10 | 11.298670 | 6.8 | 19.753930 | 26.965079 | RIO VERMELHO | -13.0055 | -38.487174
264928 | 2015-12-31 16:00:00 | 0.780022 | 0.10 | 11.103865 | 19.7 | 19.921969 | 27.233024 | RIO VERMELHO | -13.0055 | -38.487174
264929 | 2015-12-31 17:00:00 | 1.300037 | 0.10 | 10.324646 | 13.9 | 21.266282 | 33.639362 | RIO VERMELHO | -13.0055 | -38.487174
264930 | 2015-12-31 18:00:00 | 0.780022 | 0.12 | 12.662302 | 19.4 | 19.697917 | 26.867644 | RIO VERMELHO | -13.0055 | -38.487174
264931 | 2015-12-31 19:00:00 | 0.260007 | 0.10 | 13.246716 | 20.6 | 21.863754 | 27.732378 | RIO VERMELHO | -13.0055 | -38.487174
264932 | 2015-12-31 20:00:00 | 0.520015 | 0.08 | 13.636325 | 18.8 | 24.608393 | 27.659302 | RIO VERMELHO | -13.0055 | -38.487174
264933 | 2015-12-31 21:00:00 | 0.260007 | 0.10 | 14.220739 | 29.5 | 23.170725 | 23.311274 | RIO VERMELHO | -13.0055 | -38.487174
264934 | 2015-12-31 22:00:00 | 0.520015 | 0.20 | 15.779177 | 27.2 | 23.077370 | 22.580513 | RIO VERMELHO | -13.0055 | -38.487174
264935 | 2015-12-31 23:00:00 | 0.260007 | 0.22 | 14.415544 | 23.6 | 22.479897 | 25.125997 | RIO VERMELHO | -13.0055 | -38.487174
264936 | 2015-12-31 00:00:00 | 0.000000 | 0.17 | 17.142809 | 23.2 | 20.052666 | 17.696594 | RIO VERMELHO | -13.0055 | -38.487174